Automatic Discovery of Linguistic Patterns for Information Extraction
Abstract
Information Extraction (IE) systems typically rely on extraction patterns that encode domain-specific knowledge. When matched against natural language text, these patterns recognize, with high accuracy, the information relevant to the extraction task. Adapting an IE system to a new extraction scenario entails devising a new collection of extraction patterns, a time-consuming and expensive process. To overcome this obstacle, we have implemented in CICERO, our IE system, a pattern acquisition mechanism that combines lexico-semantic knowledge available from WordNet with syntactic information collected from training corpora. Because the knowledge encoded in WordNet is open-domain, our approach is portable across multiple extraction domains.
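The abstract does not specify CICERO's pattern format or its acquisition algorithm. The Python sketch below is only a rough, hypothetical illustration of the general idea. It shows how a pattern slot can be generalized through a WordNet-style hypernym hierarchy so that it also matches words that never appeared in the training examples. The tiny hand-made hierarchy stands in for WordNet and contains no real WordNet data. The names `HYPERNYMS`, `ancestors`, `generalize`, `Pattern`, and the slot labels `buyer`/`acquired` are all assumptions made for this illustration.

```python
"""Toy sketch: generalizing an extraction-pattern slot with a hypernym hierarchy.

HYPOTHETICAL: the hierarchy is a hand-made stand-in for WordNet, and the
pattern format is invented for illustration; neither is CICERO's.
"""
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical child -> parent hypernym links (assumed acyclic).
HYPERNYMS: dict[str, str] = {
    "ceo": "executive",
    "president": "executive",
    "executive": "person",
    "engineer": "person",
    "acquire": "get",
    "buy": "get",
    "purchase": "get",
}


def ancestors(word: str) -> list[str]:
    """Return the word followed by its hypernyms, nearest first."""
    chain = [word]
    while chain[-1] in HYPERNYMS:
        chain.append(HYPERNYMS[chain[-1]])
    return chain


def generalize(seeds: list[str]) -> str | None:
    """Lowest hypernym shared by all seed words, or None if there is none."""
    common = ancestors(seeds[0])
    for word in seeds[1:]:
        shared = set(ancestors(word))
        common = [c for c in common if c in shared]
    return common[0] if common else None


@dataclass
class Pattern:
    """Hypothetical pattern: <subject> <verb of class verb_class> <object>."""

    verb_class: str

    def match(self, subj: str, verb: str, obj: str) -> dict[str, str] | None:
        # The verb matches if verb_class is the verb itself or one of its hypernyms.
        if self.verb_class in ancestors(verb):
            return {"buyer": subj, "acquired": obj}
        return None


if __name__ == "__main__":
    # Generalize from two training verbs, then match a verb not seen in training.
    pattern = Pattern(verb_class=generalize(["acquire", "buy"]))
    print(pattern.verb_class)  # get
    print(pattern.match("Acme", "purchase", "Widgets Inc."))
    print(pattern.match("Acme", "hire", "Bob"))  # None
```

Generalizing to the lowest shared hypernym keeps the slot as specific as the training evidence allows. Climbing higher (for example, from "executive" to "person") would match more text at the cost of precision.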
Similar Resources
A Method for Extracting Causal Knowledge from Textual Databases
This paper describes the first phase of a project to develop a knowledge extraction and knowledge discovery system that extracts causal knowledge from a textual database automatically, and attempts to infer new causal relationships from the extracted information. The initial work is focused on developing an automatic method for identifying and extracting cause-effect information expressed in me...
Combinaison d'approches pour l'extraction automatique d'événements (Automatic events extraction by combining multiple approaches) [in French]
In this paper, we present an automatic system for extracting events based on the combination of two existing information extraction approaches: the first is made of hand-crafted linguistic rules and the second is based on the automatic learning of linguistic patterns. We have shown that this mixed approach leads to a significa...
Acquisition of Linguistic Patterns for Knowledge-based Information Extraction
In this paper we present a new method of automatic acquisition of linguistic patterns for Information Extraction, as implemented in the CICERO system. Our approach combines lexico-semantic information available from the WordNet database with collocating data extracted from training corpora. Due to the open-domain nature of the WordNet information and the immediate availability of large collecti...
A Domain-Independent Approach to IE Rule Development
A key element for the extraction of information in a natural language document is a set of shallow text analysis rules, which are typically based on pre-defined linguistic patterns. Current Information Extraction research aims at the automatic or semi-automatic acquisition of these rules. Within this research framework, we consider in this paper the potential for acquiring generic extraction pa...
Lexical Patterns or Dependency Patterns: Which Is Better for Hypernym Extraction?
We compare two different types of extraction patterns for automatically deriving semantic information from text: lexical patterns, built from words and word class information, and dependency patterns with syntactic information obtained from a full parser. We are particularly interested in whether the richer linguistic information provided by a parser allows for a better performance of subsequen...
Publication date: 2001